add autograd profiler #1693
Conversation
Hello @quinor! Thanks for updating this PR.
Comment last updated at 2020-12-02 23:27:31 UTC
@quinor what do you need help with? @jeremyjordan, could you have a look?
Tests, docs and thread safety + (of course) a review. Please see the list at the end of the top message.
@Borda would you mind looking into the thread safety?
This pull request is now in conflict... :(
What do you mean by thread-safety? In Python, unless the GIL is explicitly released, it is always safe...
Quoted from my initial PR message: "thread-safety (the profiler may fail if it's run in multiple threads, i.e. when there's more than one dataloader worker the profiler can be entered with the same action_name multiple times at once, and that doesn't end well)". Basically, if someone enters the profiler under the same action_name from several threads at once (see the sketch below), my profiler breaks. There are 3 options I see right now:
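The snippet from the original comment did not survive extraction; below is a minimal reconstruction of the failure mode, under the assumption that Lightning's `SimpleProfiler` (or any profiler used via the `profile(action_name)` context manager) is entered under the same `action_name` from two threads at once:

```python
import threading
import time

from pytorch_lightning.profiler import SimpleProfiler

profiler = SimpleProfiler()

def worker():
    # Two "dataloader workers" entering the profiler under the SAME
    # action_name: the second start("get_batch") can fire before the
    # first stop("get_batch"), so the profiler either errors out or
    # records corrupted timings.
    with profiler.profile("get_batch"):
        time.sleep(0.1)  # widen the race window

threads = [threading.Thread(target=worker) for _ in range(2)]
for t in threads:
    t.start()
for t in threads:
    t.join()
```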
Hey @quinor, do you still need help with this? Mind taking a look?
@edenlightning please see my last message. I'm not deep enough into lightning to know how to fix the issues I had pinpointed, and I don't think it can be merged without fixing them, so help with that would really be appreciated. I thought @Borda was going to look into that.
force-pushed from 31459eb to 386a8a4
@Borda should we keep this PR?
Just came across this linked in the Slack - this would be very cool functionality if it could end up merged.
feel free to work on it, I'm not deep enough in lightning to fix the issues I pinpointed :/
I'm not either unfortunately, but the issue with the profiler and num_workers > 0 seems like it may have deeper roots: pytorch/pytorch#6313
force-pushed from 386a8a4 to dc8cab4
I'll have a look once we fix master so we can easily debug the open issue here...
Codecov Report
@@            Coverage Diff            @@
##           master    #1693     +/-  ##
========================================
- Coverage      93%      93%      -0%
========================================
  Files         124      124
  Lines        9303     9355     +52
========================================
+ Hits         8616     8662     +46
- Misses        687      693      +6
force-pushed from dc8cab4 to 23280ec
@quinor how is it going? Do you think we can get this done by the end of this week so it makes it into 1.1? 🐰
@Borda I'll try to get the code ready on Wednesday, plus hopefully some docs. I hope we can then close it by the end of the week.
force-pushed from 8ccd6d5 to 7671663
force-pushed from b237677 to e68f63e
I see some tests failing for pytorch 1.3 (since old pytorch's …
@Borda it's ready from my side; some issues (tests failing due to old pytorch) are still pending that I couldn't resolve by myself. Do you still want to push it into 1.1?
Closing, #5560 implements this profiler.
Before submitting
What does this PR do?
Adds a profiler based on torch.autograd.profiler. It's different from the existing profilers in that it allows profiling the impact of specific PyTorch ops (including C/C++ and CUDA code) rather than just the Python code. Closes #3917.
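For context (this is not code from the PR, just a minimal sketch of the underlying PyTorch API it wraps), torch.autograd.profiler records per-op timings like so:

```python
import torch
from torch.autograd import profiler

x = torch.randn(64, 64)

# Collect op-level timings, covering the C++/autograd kernels,
# not just the surrounding Python code.
with profiler.profile() as prof:
    with profiler.record_function("my_forward"):
        y = x @ x
        y = torch.relu(y)

# One row per op (aten::mm, aten::relu, ...), sorted by total CPU time.
print(prof.key_averages().table(sort_by="cpu_time_total"))
```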
PR review
Anyone in the community is free to review the PR once the tests have passed.
If we didn't discuss your PR in GitHub issues, there's a high chance it will not be merged.
@jeremyjordan
Did you have fun?
Make sure you had fun coding 🙃